
WWWStat4Mac. The ideal companion to httpd4Mac, WebStar, or a UNIX or PC Web server.
"One of the most feature packed log analysers available on the internet today"
WWWStat4Mac Reference
WWWStat4Mac Options
Below is a list of all of the options available in version 1.3 of WWWStat4Mac. The options can appear in any order in the preferences file. For convenience they are split into separate categories.
- Basic Options
- Fine Tuning.
- General Notes.
What is explained in this section.
This section tries to expand on the documentation that comes 'built-in' to the preferences file. The action of most of the options is fairly obvious from their name. However, some of them need a little more explanation as the consequences of their use are not always immediately apparent.
Definition of an option.
Most of the options are made up of two parts: the option itself followed by an argument. Some options take an argument that is simply either on or off, for example auto_save off. Others take a string that can have any format, for example file_name any-name-that-takes-your-fancy. The most important rule to remember is that neither the option nor the argument can contain any spaces, CRs, TABs or other non-printing characters.
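The rule about spaces follows naturally if each line is split on whitespace. The following is a minimal Python sketch of that idea, not the program's actual parser. The function name parse_option_line is made up for illustration, and the handling of blank lines is an assumption.

```python
def parse_option_line(line):
    """Split one preferences line into (option, [arguments]).

    Assumption: the option and its arguments are separated by whitespace,
    which is why neither may contain spaces, TABs or CRs themselves.
    """
    parts = line.split()      # splits on any run of spaces, TABs or CRs
    if not parts:
        return None           # blank line: nothing to parse
    return parts[0], parts[1:]


print(parse_option_line("auto_save off"))
print(parse_option_line("file_name any-name-that-takes-your-fancy"))
```

An argument containing a space would be split into two arguments by this approach, which is one way to see why the rule matters.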
- registration_code TYPE_CODE_HERE!
If you've been good and sent in your registration fee, then you will have received a message from Kagi telling you how to register the program. All you need to do is replace the string TYPE_CODE_HERE! with the registration code sent to you by Kagi Software.
- file_name
Use this option to specify a file name for the output file.
Example: file_name statistics.html
This name is used as the prompt if you are saving the file manually, or as the file name if the file is saved automatically. Note: a wildcard is allowed at the start of the name, so if you have a log called myLog and you choose a file_name of *.stat.html then an output file called myLog.stat.html will be created. This lets you drop multiple log files onto WWWStat4Mac and have the output for each saved in a separate file.
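As a rough illustration of the wildcard rule, the sketch below (in Python, with a made-up function name output_name) replaces a leading star with the name of the log. It is only a guess at the behaviour described above, not the program's own code.

```python
def output_name(log_name, file_name):
    """Build the output file name from the file_name option.

    Assumption: a leading '*' stands for the name of the log being analysed.
    """
    if file_name.startswith("*"):
        return log_name + file_name[1:]
    return file_name          # no wildcard: use the name unchanged


print(output_name("myLog", "*.stat.html"))
print(output_name("myLog", "statistics.html"))
```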
- auto_save
Use this option to specify whether you want to save your files automatically or not.
Example: auto_save off
- file_creator
Use this option to specify the creator of the output files.
Example: file_creator R*ch (the creator code for BBEdit)
- exclude_errors
Use this option to specify whether requests that resulted in errors are counted or not. For example, if a file doesn't exist and the request fails then it is not added to the list of files requested. Note, however, that the site that tried to access the file will still be included.
Example: exclude_errors on
- max_sites
- max_files
Use these options to specify how many sites and files you want to keep track of. If you have 200 files on your site then set max_files to 200 or a number slightly higher to allow for possible future changes. For example, if you run your stats for a log that covers a long time then the names of files may have changed, but the old names will still be in the log. This in effect gives you more files. If, during processing, the processing window shows that the number of files found is equal to max_files, then the log probably contains entries for more than max_files files.
Example: max_sites 2000
There is an ultimate limit of 100 000 sites and 20 000 files! In practice this limit is restricted by the amount of memory available to WWWStat4Mac. If you increase these numbers then you will need to increase the memory allocation of WWWStat4Mac.
- create_cache
Use this option to create a cache from the data just analysed. Once a cache file exists, the next time the same log file is analysed all of the data already in the cache can be reused to save processing time. Only the new information in the log then has to be added to the cache. Note that if you change your options then the cache might become invalid. This is most important if you change the file or site filters.
Example: create_cache on
- The output file will always contain, as a bare minimum, a summary of the server traffic. This will include the number of hits, sites and files, and the total bytes transferred. However, the next four options can be added as needed. All are either on or off.
- site_stats
- file_stats
- robot_stats
- domain_stats
Note: The domain list came from the following URL: http://www.ics.uci.edu/WebSoft/wwwstat/country-codes.txt.
If you need to make additions to the file these can be added in the STR# resource ID 129.
- dns_cache
This option allows you to make a cache of any DNS names that your server might look up while it is processing. It allows you to make a local domain name cache so that you can resolve domain names without having to look them up again. If you don't want to have a cache set this option to zero. A good idea is to process the log without a domain cache and see how many unresolved hits there are by looking at the domain stats. Then increase the cache to about half of this. There will nearly always be some sites that cannot be resolved.
- traffic_stats
This option tells WWWStat4Mac to compile traffic stats for your server. This allows you to see when it's at its busiest, which month had the most accesses, which day of the week sees the most hits, who is the biggest user, etc. You can request data covering several time spans: by Hour of Day, Day of Week, Day of Year, Week of Year, and Month of Year. You can choose to have some or all of these on at the same time.
Example traffic_stats hourly
Example traffic_stats daily
Example traffic_stats weekly
Example traffic_stats monthly
- traffic_display
This option allows you to control which aspects of the traffic stats you want to save. You can choose to see the number of bytes transferred or the number of hits, for both files and sites. You have three options: bytes, hits or both.
Example traffic_display both
- top_sites
- top_files
- top_domains
These three options let you decide how many sites, files and domains you actually want to show in your stats output.
NOTE. If you want to show all of the sites/files/domains then make these values 0.
NOTE. If you have the output as tables (see later) then I suggest that you limit these values to 50 or less, otherwise browsers will require large (8M+) amounts of memory to display these tables. This is a limitation of the Web browser and the way that it displays tables not WWWStat4Mac. WWWStat4Mac could easily make you a table with 100,000 entries, but Netscape would never be able to draw it.
Example top_sites 25
- sort_sites
Use this option to specify how to sort the SITE data that is going into the stats. There are five options [name/hits/bytes/date/visits]. See later for info on how to specify a 'visit'. If you choose sorting by name then the sites are sorted alphabetically. If you choose sorting by hits, bytes or visits then they are sorted in descending numerical order. Sorting by date gives the sites in reverse chronological order, so the sites that accessed your server most recently come first. See later for new options that enhance the table output.
NOTE. When sorting sites by name, any sites that are unresolved, e.g. 128.1.3.4, will appear at the top of the list since numbers sort before letters.
Example sort_sites hits
- sort_files
Use this option to specify how to sort the FILE data that is going into the stats. You have five options [name/hits/bytes/date/size]. If you choose sorting by name then the files are sorted alphabetically. If you choose sorting by hits, bytes or size then they are sorted in descending numerical order. Sorting by date gives the files in reverse chronological order, so the files that were accessed most recently come first. See later for new options that enhance the table output.
Example sort_files hits
- sort_domains
Use this option to specify how to sort the DOMAIN data that is going into the stats. You have four options [name/hits/bytes/date]. If you choose sorting by name then the domains are sorted alphabetically. If you choose sorting by hits or bytes then they are sorted in descending numerical order. Sorting by date gives the domains in reverse chronological order, so domains containing the sites that accessed files most recently come first.
Example sort_domains hits
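To show how the sort_ and top_ options might combine, here is a small Python sketch. The data layout (a dictionary of names to counters) and the function name top_entries are assumptions made for the example and say nothing about how WWWStat4Mac stores its data internally.

```python
def top_entries(stats, sort_key, top_n):
    """Order the entries as a sort_* option would, then keep the top_* entries.

    stats    : dict mapping a name to a dict of counters, e.g. {"hits": 9, "bytes": 900}
    sort_key : "name", or the name of a counter such as "hits" or "bytes"
    top_n    : how many entries to keep; 0 means keep them all
    """
    if sort_key == "name":
        # Alphabetical order; names starting with a digit sort before letters.
        ordered = sorted(stats.items(), key=lambda item: item[0])
    else:
        # Numeric counters are shown in descending order.
        ordered = sorted(stats.items(), key=lambda item: item[1][sort_key], reverse=True)
    return ordered if top_n == 0 else ordered[:top_n]


sites = {
    "b.com": {"hits": 5, "bytes": 500},
    "128.1.3.4": {"hits": 2, "bytes": 2000},
    "a.org": {"hits": 9, "bytes": 900},
}
print([name for name, _ in top_entries(sites, "name", 0)])
print([name for name, _ in top_entries(sites, "hits", 2)])
```

Sorting by name puts the unresolved address 128.1.3.4 first, which matches the note under sort_sites.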
- visit_time
More often than not a site will access the server and retrieve a collection of files. This set of files can be thought of as a single "visit" to the server. To define a "visit" we need to set the maximum time between hits that we would expect to occur within a single visit. If you are serving large text files that need a while to read then set your visit time reasonably high. If, on the other hand, you are serving small text files or small graphics then you might want a small visit time. The time is defined in seconds. The default time is 300s (i.e. 5 minutes).
Example visit_time 600
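The idea of a visit can be sketched in a few lines of Python. This is only an illustration of the definition above. It assumes the hit times for one site are given in seconds and that a gap strictly longer than visit_time starts a new visit; the exact comparison used by the program is not documented here.

```python
def count_visits(hit_times, visit_time=300):
    """Count the visits made by one site.

    hit_times  : times of the site's hits, in seconds
    visit_time : largest gap between two hits that still counts as one visit
    """
    visits = 0
    previous = None
    for t in sorted(hit_times):
        if previous is None or t - previous > visit_time:
            visits += 1       # first hit, or the gap was too long: a new visit
        previous = t
    return visits


hits = [0, 100, 350, 1000]
print(count_visits(hits))         # with the default visit_time of 300
print(count_visits(hits, 200))    # a shorter visit_time splits the hits further
```

With these made-up times the default of 300 seconds groups the first three hits into one visit, while a visit_time of 200 separates the third hit into a visit of its own.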
- resolve_names
- resolve_time
Version 1.1 added the ability to resolve names using the domain name system. This gives you the opportunity to resolve names that were not resolved by the web-server either because the DNS was down, too slow, or turned off. You need to do a few things to get this working.
- Set the resolve_names option to on.
- Next choose a time that you are prepared to wait for each name to be resolved. Set the resolve_time to a value in seconds that you wish to wait for the DNS to return. If you set the time to 0 then the application will wait until the DNS returns or fails to find a name. Be warned however that this could take up to 120 seconds per item!!!!
- Make sure that you have MacTCP or Open Transport installed and configured to use a name server. If you don't know how to do this you should consult your network manager.
Example resolve_names on
Example resolve_time 10
- active_links
This option allows you to have 'active' links in the output file created by WWWStat4Mac. This means that a user looking at the statistics file can go directly to a file just by clicking on it, for example to jump straight to the most popular pages.
Example active_links on
- create_tables
The normal output from WWWStat4Mac is a standard text file. However, if you want your output to consist of stunning tables then turn this option on :-) Of course you and your users will need a browser capable of displaying tables, e.g. Netscape Navigator or Microsoft's Internet Explorer.
Example create_tables on
- Extra Table options
New in version 1.4 are the following options for enhancing the content of your tables. By combining these options with the sorting by date options you can show the last sites to access your pages or the last files accessed.
- include_date_in_table
This option lets you include in the table the date that files or sites were last accessed.
- include_time_in_table
This option lets you include in the table the time that files or sites were last accessed.
Example include_date_in_table on
- completion_sound
This option lets you set an audible alert when the processing has finished. This is either on or off.
Example completion_sound off
- filter_cgi_hits
When a server is running a CGI application some of the hits appearing in the log will contain many options. It is often preferable to group all of these CGI hits together. This option will look at the file name and truncate it at the '?' symbol. This option will also filter other hits including image map hits.
In fact anything with a ? in it :-)
Example filter_cgi_hits on
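The truncation described above amounts to cutting the name at the first '?'. A minimal Python sketch follows; the function name and the sample paths are invented for illustration.

```python
def filter_cgi_hit(path):
    """Truncate a requested name at the first '?' so that all hits on the
    same CGI (or image map) are grouped together."""
    return path.split("?", 1)[0]


print(filter_cgi_hit("/cgi-bin/search.cgi?q=mac&n=10"))
print(filter_cgi_hit("/maps/nav.map?120,45"))
print(filter_cgi_hit("/index.html"))
```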
- map_default_to_file
When a request is made to the server and no filename is specified, it appears in the log as a request for '/'. However, this is the same as requesting the server's default file. With this option we can make hits on '/' look like hits on the default file. The default filename is specified as the argument to this option. This replaces an older option called map_default_to_home, since not all servers have home.html as the default file. If you have an old preferences file which still contains the old option then WWWStat4Mac will automatically call the map_default_to_file option with the argument home.html. In version 1.2 this option was extended to allow hits on folders to be mapped to a file too.
For example
/foo/bar/ appears in the log as /foo/bar/home.html
when the option is set like this
Example map_default_to_file home.html
NOTE. The file is defined without the preceding '/' otherwise a hit on '/' would appear as a hit on '//home.html' in the log.
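The mapping, and the reason for leaving off the leading '/', can be seen in this short Python sketch. It assumes that any request ending in '/' is treated as a request for a folder; the function name map_default is made up.

```python
def map_default(path, default_file):
    """Map a hit on '/' or on a folder to the server's default file.

    Assumption: a requested name ending in '/' is a folder request.
    """
    if path.endswith("/"):
        return path + default_file
    return path


print(map_default("/", "home.html"))
print(map_default("/foo/bar/", "home.html"))
print(map_default("/", "/home.html"))   # leading '/' in the argument gives '//home.html'
```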
Use these options to 'fine tune' the output from your analysis. These options can be used to enhance the basic information that you get from your log and also to filter out information that you don't want to appear in the final statistics.
- site_inclusion
- file_inclusion
Inclusions allow you to define up to 50 sites and 50 files that you would like to explicitly include in the output. You might want to do this because you only want to look at a portion of your statistics file.
Wildcards are supported at either the beginning or the end of a string, but not in the middle, by using a star '*' for the extra characters.
You can later exclude any files that you have chosen to include, but that you don't want to show in the final result. For example, if you wanted to include files in a directory called /commercial that had been accessed from universities in the United Kingdom you would have something like this.
Example file_inclusion /commercial*
Example site_inclusion *.ac.uk
NOTE. ANY SITE OR FILE THAT IS NOT INCLUDED WILL BE EXCLUDED.
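The wildcard rule used by the inclusion options can be sketched as follows in Python. The function name matches is hypothetical. The comparison is made in lowercase because, as the general notes say, strings are not case sensitive.

```python
def matches(pattern, name):
    """Compare a site or file name with a pattern that may carry a single
    '*' at its beginning or at its end (never in the middle)."""
    pattern, name = pattern.lower(), name.lower()
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])      # '*.ac.uk' matches by ending
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])   # '/commercial*' matches by beginning
    return name == pattern                     # no wildcard: exact match


print(matches("*.ac.uk", "sodium.ch.man.ac.uk"))
print(matches("/commercial*", "/commercial/cars/index.html"))
print(matches("*.ac.uk", "www.apple.com"))
```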
- site_exclusion
- file_exclusion
Exclusions allow you to define up to 50 sites and 50 files that you would like to explicitly exclude from the output. You might want to do this because you don't want to see hits on graphics files or you want to hide sensitive files from the log. Wildcards are supported at either the beginning or the end of the string, but not in the middle, by using a star '*' for the extra characters.
You can exclude any files or sites that you have already chosen to include, but that you don't want to show in the final result. For example, if you wanted to exclude files in a directory called /commercial/cars/ and all gif files you would have something like this.
Example file_exclusion /commercial/cars/*
Example file_exclusion *.gif
If you wanted to exclude all hits from AltaVista's search robot you would do this.
Example site_exclusion scooter.pa-x.dec.com.
- site_and_file
If you want to exclude not only the site but also the files that it is accessing then set the site_and_file option to on. Otherwise the site itself will be excluded but the file that was accessed will still be counted. For example if AltaVista's robot 'scooter' accessed a file called home.html and you wanted to exclude all traffic from this robot you would set the following options.
Example site_exclusion scooter.pa-x.dec.com.
Example site_and_file on
- file_and_site
If you want to exclude a file and also the site that was accessing it then set the file_and_site option to on. Otherwise the file will be excluded but the site that accessed it will still be counted. For example, if you have two special secret sites called secret1.mydomain.com and secret2.mydomain.com and they access a file called top_secret.html and you want neither the file nor the sites to appear in the log you can use the following options.
Example file_exclusion top_secret.html
Example file_and_site on
This is equivalent to setting the following options.
Example file_exclusion top_secret.html
Example site_exclusion secret1.mydomain.com
Example site_exclusion secret2.mydomain.com
As you can see the first method saves you having to explicitly exclude each site that might access the file top_secret.html.
NOTE
Each line of the log contains a site and a file. We have to apply the inclusion and exclusion options described above to both the site and the file using the following rules.
- Stage 1: First check for site inclusions. If the site is included or there are no site_inclusions then continue, otherwise go to stage 4.
- Stage 2: If the site is included then check to see if it is excluded. If it is not excluded then go to stage 4.
- Stage 3: If the site_and_file option is on then exclude the file because the site is excluded.
- Stage 4: Check to see if the file is in the included list. If the file is included or there are no file inclusions, and it was not excluded in stage 3, then go to stage 5.
- Stage 5: If the file is included then check to see if it is in the file exclusions list. If it is not excluded then process it.
- Stage 6: If the file is excluded and the file_and_site option is on then we must also exclude the site even if it was previously included.
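The rules above can be read in more than one way, so the following Python sketch shows just one possible reading and should not be taken as the program's actual logic. The function names, the dictionary of options and the returned pair (count the site, count the file) are all assumptions made for the example.

```python
def matches(pattern, name):
    """Wildcard test: one '*' allowed at the start or the end of the pattern."""
    pattern, name = pattern.lower(), name.lower()
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def any_match(patterns, name):
    return any(matches(p, name) for p in patterns)


def filter_hit(site, file, opts):
    """Return (count_site, count_file) for one line of the log."""
    # Stages 1 and 2: the site must be included (or there are no
    # site inclusions at all) and must not be excluded.
    site_included = not opts["site_inclusion"] or any_match(opts["site_inclusion"], site)
    site_excluded = site_included and any_match(opts["site_exclusion"], site)
    count_site = site_included and not site_excluded

    # Stage 3: an excluded site takes its file with it if site_and_file is on.
    file_dropped_with_site = site_excluded and opts["site_and_file"]

    # Stages 4 and 5: the same two checks for the file.
    file_included = not opts["file_inclusion"] or any_match(opts["file_inclusion"], file)
    file_excluded = file_included and any_match(opts["file_exclusion"], file)
    count_file = file_included and not file_excluded and not file_dropped_with_site

    # Stage 6: an excluded file takes its site with it if file_and_site is on.
    if file_excluded and opts["file_and_site"]:
        count_site = False
    return count_site, count_file


opts = {
    "site_inclusion": [], "site_exclusion": ["scooter.pa-x.dec.com"],
    "file_inclusion": [], "file_exclusion": ["top_secret.html"],
    "site_and_file": True, "file_and_site": True,
}
print(filter_hit("scooter.pa-x.dec.com", "home.html", opts))
print(filter_hit("secret1.mydomain.com", "top_secret.html", opts))
print(filter_hit("www.apple.com", "home.html", opts))
```

In this reading a site that is simply not in the inclusion list does not trigger site_and_file; only a site matched by a site_exclusion does, which follows stage 1 jumping straight to stage 4.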
- new_domain
If you want to define your own custom domains then these can be specified by using the new_domain option followed by the partial string that needs matching. It is important that the whole argument string is in a single string. So for example
Example new_domain UK_Commercial.co.uk
Example new_domain Demon_UK.demon.co.uk
Example new_domain Manchester_Univ.man.ac.uk
Example new_domain Digital_Computer.dec.com
Example new_domain IBM_Computer.ibm.com
Example new_domain Apple_Computer.apple.com
| - All One String - |
In the last example, if a site name ends in .apple.com then this would be added as a hit to the custom domain called Apple_Computer. It would also be added to the standard domain '.com'.
It is also possible to use this option to alias machines to simple names. For example
Example new_domain Peter_Hardman.sodium.ch.man.ac.uk
You can extend this to collect several machines together into a single group. For example.
Example new_domain Machines_I_Use.sodium.ch.man.ac.uk
Example new_domain Machines_I_Use.silver.ch.man.ac.uk
Example new_domain Machines_I_Use.xserv1.dl.ac.uk
Example new_domain Machines_I_Use.sunserver.ssci.liv.ac.uk
This maps any hits from the various machines to the custom domain Machines_I_Use. There is currently a limit of 40 custom domains, but this should be more than enough. If you require more then send me an email.
NOTE. This option is different from the site_alias option described later as it doesn't alter the name of the site.
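One way to picture the new_domain argument is as a custom name joined to the string to match. The Python sketch below splits the argument at its first '.', which is an assumption based on the examples above (none of the custom names contains a dot). The function names are made up, and the program's real matching may differ.

```python
def parse_new_domain(argument):
    """Split 'Apple_Computer.apple.com' into ('Apple_Computer', '.apple.com').

    Assumption: the custom name is everything before the first '.'.
    """
    name, _, rest = argument.partition(".")
    return name, "." + rest.lower()


def custom_domain(site, definitions):
    """Return the custom domain a site falls into, or None if there is none."""
    site = site.lower()
    for name, suffix in definitions:
        # Match sites ending in the suffix, or the aliased machine itself.
        if site.endswith(suffix) or site == suffix[1:]:
            return name
    return None


definitions = [
    parse_new_domain("Apple_Computer.apple.com"),
    parse_new_domain("Machines_I_Use.sodium.ch.man.ac.uk"),
    parse_new_domain("Machines_I_Use.silver.ch.man.ac.uk"),
]
print(custom_domain("www.apple.com", definitions))
print(custom_domain("silver.ch.man.ac.uk", definitions))
print(custom_domain("www.ibm.com", definitions))
```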
- prefix_file
- suffix_file
If you want to add a custom header or footer automatically to your output file you can define them here. For example, you could add your company logo at the top and a button bar at the bottom of the file. To do this you need to make files called 'WWWStat4Mac Prefix' and 'WWWStat4Mac Suffix' and put them in the preferences folder.
Then enable the options as follows
Example prefix_file on
Example suffix_file on
- site_alias
- file_alias
Version 1.3 introduces the concept of aliases. For example, if you have a file that you want to make look like another file, or there is a site that uses dynamic name allocation and you want to group together all hits from this site, then these options let you do it.
For example if you want to make the sites host123.aol.com and host321.aol.com appear in the log as simply aol.com then use this option.
Example site_alias *.aol.com aol.com
If you want to group together all of the hits on GIF files so that they appear as a generic group you can do this.
Example file_alias *.gif gif_files
NOTE. Unlike all of the other options this one uses TWO arguments. The first is what you want to alias and the second is what it will become.
IMPORTANT. The alias option is a very useful feature but what it does is really very simple. It changes the name of a site or file to that of another site or file. When you change the name of a site it then gets treated as if it was that site. Because of the way that the alias process works if a file could have two possible aliases only the first will be used. For example if you have these two options
file_alias *.gif gif_files
file_alias foo.* foo_files
and there is a file called foo.gif then the first alias option will change the name of the file from foo.gif to gif_files and then exit. The second alias option will never be considered.
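The 'first alias wins' behaviour can be sketched in Python as a loop that stops at the first matching pattern. The helper names are invented for the example, and the aliases are assumed to be tried in the order they appear in the preferences file.

```python
def matches(pattern, name):
    """Wildcard test: one '*' allowed at the start or the end of the pattern."""
    pattern, name = pattern.lower(), name.lower()
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def apply_aliases(name, aliases):
    """Rename a site or file using the first alias whose pattern matches."""
    for pattern, replacement in aliases:
        if matches(pattern, name):
            return replacement    # stop here: later aliases are never considered
    return name                   # no alias applies: keep the original name


file_aliases = [("*.gif", "gif_files"), ("foo.*", "foo_files")]
print(apply_aliases("foo.gif", file_aliases))
print(apply_aliases("foo.html", file_aliases))
print(apply_aliases("host123.aol.com", [("*.aol.com", "aol.com")]))
```

Swapping the order of the two file aliases would turn foo.gif into foo_files instead, so the order of the options matters.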
- file_type
NEW IN VERSION 1.4.1
This option is similar in many ways to the file_alias option described above, except that it doesn't alter the name of the file in the output. You use the file_type option when you want to keep track of specific file types. This option lets you track how many unique files of each type are served, and the total number of files of that type served. The option consists of two parts. The first is the type that you want to match. This can contain a wildcard at either end of the type (but not both). The second is a string giving a description of the type. If you do not wish to enter a description then you can leave this field blank. You can have up to 50 different file types defined.
Example file_type *.html HTML_Documents
Example file_type *.hqx Encoded_Macintosh_Files
You don't have to limit yourself just to normal suffixes either; you can count stats for directories, for example.
Example file_type /pages/wwwstat4mac/* WWWStat4Mac_Docs
When you have this option entered an extra table is produced in the output after the initial information with a line for each type.
Wildcards are also supported at either the beginning or the end of strings, but not in the middle, by using a star '*' for the extra characters. The options that will accept wild cards are
Any strings that are used to specify files or site names are not case sensitive. All strings are first converted to lowercase by the program before any comparisons.
This file last modified on 2-Nov-96 at 2:46 pm by Peter Hardman.